-
Notifications
You must be signed in to change notification settings - Fork 25.6k
[ML] Limit in flight requests when indexing model download parts #112992
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Pinging @elastic/ml-core (Team:ML) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for the newer changes. Although, ideally there should be a test that shows the number of parallel downloads is limited. In any case, I'm gonna leave the final LGTM to ML, if that's ok.
| true | ||
| ); | ||
|
|
||
| client.execute(PutTrainedModelDefinitionPartAction.INSTANCE, modelPartRequest).actionGet(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the part that is blocking and/or slowing down the indexing, correct? We'll wait for the client response rather than continue asynchronously?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The gain is from using 5 threads to stream the download and index the parts. The non-blocking write meant that we had more than 5 in flight requests (the download is faster than the indexing) and that was causing the OOM. In order to limit the number of requests to at most 5 there has to be some element of blocking. Model download uses it's a dedicated thread pool so the block does not starve other parts of the code of resources
| for (int i = 0; i < ranges.size() - 1; i++) { | ||
| assertThat(ranges.get(i).rangeStart(), is(startBytes)); | ||
| long end = startBytes + ((long) ranges.get(i).numParts() * chunkSize) - 1; | ||
| assertThat(ranges.get(i).rangeEnd(), is(end)); | ||
| long expectedNumBytesInRange = (long) chunkSize * ranges.get(i).numParts() - 1; | ||
| assertThat(ranges.get(i).rangeEnd() - ranges.get(i).rangeStart(), is(expectedNumBytesInRange)); | ||
| assertThat(ranges.get(i).startPart(), is(startPartIndex)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
|
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…stic#112992) Restores the changes from elastic#111684 which uses multiple streams to improve the time to download and install the built in ml models. The first iteration has a problem where the number of in-flight requests was not properly limited which is fixed here. Additionally there are now circuit breaker checks on allocating the buffer used to store the model definition.
💔 Backport failed
You can use sqren/backport to manually backport by running |
…2992) (#113514) Restores the changes from #111684 which uses multiple streams to improve the time to download and install the built in ml models. The first iteration has a problem where the number of in-flight requests was not properly limited which is fixed here. Additionally there are now circuit breaker checks on allocating the buffer used to store the model definition.
…stic#112992) Restores the changes from elastic#111684 which uses multiple streams to improve the time to download and install the built in ml models. The first iteration has a problem where the number of in-flight requests was not properly limited which is fixed here. Additionally there are now circuit breaker checks on allocating the buffer used to store the model definition. # Conflicts: # x-pack/plugin/ml-package-loader/src/main/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackage.java # x-pack/plugin/ml-package-loader/src/test/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackageTests.java
…2992) (#113710) Restores the changes from #111684 which uses multiple streams to improve the time to download and install the built in ml models. The first iteration has a problem where the number of in-flight requests was not properly limited which is fixed here. Additionally there are now circuit breaker checks on allocating the buffer used to store the model definition. # Conflicts: # x-pack/plugin/ml-package-loader/src/main/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackage.java # x-pack/plugin/ml-package-loader/src/test/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackageTests.java
#111684 improved the model install time by using multiple streams and threads to download and write the model parts. The change was reverted in #112961 after it was discovered the be the cause of Out Of Memory exceptions.
The design relied on using a fixed size thread pool to limit the concurrent downloads and hence also manage memory usage. However, the indexing of the downloaded document was performed async which meant a new download request would be forked and executed while the write request was still in flight leading to large numbers of in flight requests. The fix here is to block on the index write.
The first commit is the revert of the revert, the later commits introduce the blocking write and reuse a byte buffer that was being recreated for every downloaded part. Allocating that byte buffer is now protected by a circuit breaker.
Labelled as a non issue because the code that caused the OOM was reverted before it made it to a production environment.